NADA: A Robust System for Non-referential Pronoun Detection

Authors

  • Shane Bergsma
  • David Yarowsky
Abstract

We present Nada: the Non-Anaphoric Detection Algorithm. Nada is a novel, publicly available program that accurately distinguishes between referential and non-referential uses of the pronoun "it" in raw English text. Like recent state-of-the-art approaches, Nada uses very large-scale web N-gram features, but Nada makes these features practical by compressing the N-gram counts so that they fit into computer memory. Nada therefore operates as a fast, stand-alone system. Nada also improves over previous web-scale systems by considering the entire sentence, rather than narrow context windows, via long-distance lexical features. Nada substantially outperforms other state-of-the-art systems in non-referential detection accuracy.
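
As a rough illustration of how N-gram counts might be compressed to fit in memory, the sketch below stores one byte per hashed slot and keeps only a log-scale bucket of each count. The abstract does not say how Nada compresses its counts, so everything here is an assumption made for illustration: the class name `CompressedCounts`, the CRC32 hashing, the logarithm base, and the slot count are all hypothetical.

```python
import math
import zlib
from array import array


class CompressedCounts:
    """Hypothetical lossy n-gram table: hashed slots with 1-byte log-scale counts.

    This is an illustrative scheme, not the compression used by Nada.
    """

    def __init__(self, num_slots=1 << 16, base=1.5):
        self.num_slots = num_slots
        self.base = base
        # One unsigned byte per slot; 0 means "no count stored".
        self.slots = array("B", [0]) * num_slots

    def _slot(self, ngram):
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(ngram.encode("utf-8")) % self.num_slots

    def _quantize(self, count):
        # Store 1 + floor(log_base(count)), capped so it fits in one byte.
        if count <= 0:
            return 0
        return min(255, 1 + int(math.log(count, self.base)))

    def add(self, ngram, count):
        # On a hash collision the larger bucket wins, so counts may be overestimated.
        i = self._slot(ngram)
        self.slots[i] = max(self.slots[i], self._quantize(count))

    def approx_count(self, ngram):
        # Returns the lower edge of the stored bucket, or 0.0 for an empty slot.
        code = self.slots[self._slot(ngram)]
        return 0.0 if code == 0 else self.base ** (code - 1)


table = CompressedCounts()
table.add("it is raining", 1000)
table.add("it barked", 3)
print(table.approx_count("it is raining"))
```

The trade-off in this sketch is precision for space: each stored count is only accurate to within a factor of `base`, and colliding n-grams share a slot, but the table size is fixed in advance regardless of how many n-grams are added.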

Related resources

Distributional Identification of Non-Referential Pronouns

We present an automatic approach to determining whether a pronoun in text refers to a preceding noun phrase or is instead nonreferential. We extract the surrounding textual context of the pronoun and gather, from a large corpus, the distribution of words that occur within that context. We learn to reliably classify these distributions as representing either referential or non-referential pronou...

The Referential Versus Non-referential Use of the Neuter Pronoun in Dutch and English

This paper discusses a corpus-based investigation of the distribution of the third-person neuter singular pronoun in Dutch (“het”). We labeled all pronominal occurrences of “het” in a large corpus of documents. On the basis of the annotated corpora, we developed an automatic classification system using machine learning techniques to distinguish between the different uses of the neuter pronoun. A...

Disambiguation of the Neuter Pronoun and Its Effect on Pronominal Coreference Resolution

Coreference resolution, determining the appropriate discourse referent for an anaphoric expression, is an essential but difficult task in natural language processing. It has been observed that an important source of errors in machine-learning-based approaches to this task is the wrong disambiguation of the third-person singular neuter pronoun as either referential or non-referential. In this p...

Supervised Ranking for Pronoun Resolution: Some Recent Improvements

A recently-proposed machine learning approach to reference resolution — the twin-candidate approach — has been shown to be more promising than the traditional single-candidate approach. This paper presents a pronoun interpretation system that extends the twin-candidate framework by (1) equipping it with the ability to identify non-referential pronouns, (2) training different models for handling...

How Far Are We From (Semi-)Automatic Annotation Of Anaphoric Links In Corpora?

The paper raises for discussion a proposal for the semi-automatic annotation of pronoun-antecedent pairs in corpora. The proposal is based on robust knowledge-poor pronoun resolution followed by post-editing. The paper is structured as follows. The introduction comments on the fact that automatic identification of referential links in corpora has lagged behind in comparison with similar lexical...

Publication date: 2011